# Measuring Media Criticism with ALC Word Embeddings

This repository contains the replication code for the study "Measuring Media Criticism with ALC Word Embeddings," published in the journal *Political Analysis*. The study measures media criticism using à la carte (ALC) word embeddings.

Scripts should be run in order.

The download link for the raw data will be supplied separately, and only for the purposes of root-and-branch replication. The raw data are not included in the public replication package.

## Authors

- Christopher Barrie
- Neil Ketchley
- Alexandra Siegel
- Mossaab Bagdouri

## Setup Instructions for Python scripts

1. Create and activate a virtual environment:
```bash
# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate  # On macOS/Linux
# or
.\venv\Scripts\activate  # On Windows
```

2. Install required dependencies:
```bash
pip install -r requirements.txt
```

## Setup Instructions for R scripts

We use `renv` in this replication package. The `renv.lock` file in the repository captures the exact versions of every package used in the analysis.

To activate, use: 

```r
renv::activate()
```

Then restore the package versions recorded in `renv.lock`:

```r
renv::restore()
```

## Potential System Dependencies

This project may require some system-level libraries for compiling certain R packages (e.g., `rsparse` depends on gfortran).

### macOS

Install GCC (which includes gfortran) using Homebrew:
```bash
brew install gcc
```
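Before restoring the R packages, it can help to confirm that the compiler is actually on your `PATH`. The following is a minimal sketch, not part of the replication package; the helper name `check_tool` is hypothetical.

```bash
# Hypothetical helper (not part of the replication package):
# report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# gfortran is the dependency named above; print a hint if it is absent.
check_tool gfortran || echo "gfortran not found; install it before running renv::restore()"
```

The function returns a non-zero status when the tool is missing, so it can also be used as a guard in a setup script.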

## Execute the Master Script

From the root directory of the repository, make sure that `run.sh` is executable:

```bash
chmod +x run.sh
```

Then run the script:

```bash
./run.sh
```

This executes all the analysis scripts in order and directs all output to a log file named `run.log`. In the interest of expediency, the log currently covers only the main analyses. When the pipeline is run again in full, the log will also cover the initial pre-processing steps. Because those steps are not part of the public replication package, they are not included in the Dataverse.
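After a run, one quick way to check the log is to search it for lines that look like failures. This is a rough sketch under assumptions: the helper name `scan_log` is hypothetical, and the patterns are only common R and Python failure markers, not a format that `run.log` is known to follow.

```bash
# Hypothetical helper (not part of the replication package):
# print the lines of a log file that look like failures.
scan_log() {
  log_file="${1:-run.log}"
  if [ ! -f "$log_file" ]; then
    echo "no log file at $log_file"
    return 2
  fi
  # The patterns are assumptions: typical R and Python failure markers.
  if grep -n -i -E 'error|traceback|execution halted' "$log_file"; then
    return 1
  fi
  echo "no error lines found in $log_file"
}

# Scan the log written by run.sh; '|| true' keeps the shell from aborting.
scan_log run.log || true
```

The function returns 0 when nothing matches, 1 when matching lines are printed, and 2 when the log file does not exist.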

## Notes on Stochastic Scripts and Commercial LLMs

Scripts numbered `51_` and above are not included in the `run.sh` routine, for two main reasons:

1. **Commercial LLM Dependencies:** The omitted scripts rely on commercial Large Language Models (LLMs) that require API keys to operate.

2. **Stochastic Outputs:** These scripts produce outputs that are stochastic in nature (i.e., they generate slightly different results on each execution). To maintain the consistency and reproducibility of the reported results, only the original data generated by these scripts is provided. The script(s) used to generate those outputs are included for reference.

## Computational Environment

### Software Environment
- **Operating System:** macOS (e.g., macOS Ventura 13.x)  
- **Programming Languages and Versions:**
  - **R:** Version 4.3.1  
    - Key packages: tidyverse, ggplot2, etc. (exact versions are recorded in `renv.lock`)
  - **Python:** Version 3.10.6  
    - Key packages: pandas, numpy, matplotlib, etc. (see `requirements.txt`)

### Hardware Specifications
All analyses were conducted on the following hardware:
- **Model Name:** MacBook Pro  
- **Model Identifier:** Mac15,9  
- **Chip:** Apple M3 Max  
- **Total Number of Cores:** 16 (12 performance and 4 efficiency)  
- **Memory:** 64 GB  
- **Disk Space:** (Ensure at least *[estimate]* GB of free space for data and intermediate files)
- **Operating System Version:** (Specify the macOS version, e.g., macOS Ventura 13.x)
